4 research outputs found
Learning and Improving Policies for Probabilistic Planning Problems
In this work, we study the problem of learning and improving policies for probabilistic planning problems. In the first part, we train neural network policies for probabilistic planning problems modeled as factored Markov decision processes (MDPs). The objective is to train problem-specific neural networks via supervised learning to imitate the action choices of expert planners. In the second part, we focus on online policy improvement, where we try to improve on a given base policy via online search. Because search trees for these problems tend to be huge, action branches must be pruned in practice, which can adversely affect policy improvement. We formalize this notion by introducing the choice function framework and establish sufficient conditions on the actions expanded in search trees for guaranteed policy improvement. In the third part, we draw attention to the fact that theoretical guarantees of policy improvement can fail when the ideal conditions assumed in theory do not hold in practice. We propose benchmark problems, baselines, and metrics to assess the empirical performance of online policy improvement algorithms. In the final part, we focus on approximation via state aggregation in MDPs and study the theoretical guarantees of several aggregation schemes.
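To make the aggregation idea concrete, here is a minimal Python/NumPy sketch of one simple aggregation scheme for a tabular MDP: concrete states in the same cluster are averaged uniformly into an abstract state, and the abstract model is then solved by value iteration. The uniform-averaging scheme and all names below are illustrative assumptions, not the thesis's exact constructions.

    import numpy as np

    def aggregate_mdp(P, R, clusters, n_abstract):
        """Build an abstract MDP by uniform averaging within clusters.
        P: (S, A, S) transition tensor, R: (S, A) rewards,
        clusters: length-S array mapping each state to its abstract state."""
        S, A, _ = P.shape
        P_abs = np.zeros((n_abstract, A, n_abstract))
        R_abs = np.zeros((n_abstract, A))
        counts = np.bincount(clusters, minlength=n_abstract)
        for s in range(S):
            w = 1.0 / counts[clusters[s]]  # uniform weight within the cluster
            for a in range(A):
                R_abs[clusters[s], a] += w * R[s, a]
                for s2 in range(S):
                    P_abs[clusters[s], a, clusters[s2]] += w * P[s, a, s2]
        return P_abs, R_abs

    def value_iteration(P, R, gamma=0.95, iters=500):
        """Solve the (abstract) MDP with Bellman optimality backups."""
        S, A, _ = P.shape
        V = np.zeros(S)
        for _ in range(iters):
            V = (R + gamma * (P @ V)).max(axis=1)
        return V

The quality of the resulting abstract value function depends entirely on how well the clustering respects the MDP's structure, which is exactly the kind of question the aggregation-scheme guarantees address.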
The Choice Function Framework for Online Policy Improvement
There are notable examples of online search improving over hand-coded or learned policies (e.g., AlphaZero) for sequential decision making. It is not clear, however, whether policy improvement is guaranteed for many of these approaches, even when given a perfect evaluation function and transition model. Indeed, simple counterexamples show that seemingly reasonable online search procedures can hurt performance compared to the original policy. To address this issue, we introduce the choice function framework for analyzing online search procedures for policy improvement. A choice function specifies the actions to be considered at every node of a search tree, with all other actions being pruned. Our main contribution is to give sufficient conditions for stationary and non-stationary choice functions to guarantee that the value achieved by online search is no worse than that of the original policy. In addition, we describe a general parametric class of choice functions that satisfy those conditions and present an illustrative use case of the framework's empirical utility.
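As a concrete illustration of the framework's central idea, the following Python sketch runs a depth-limited lookahead in which a choice function prunes every action except the base policy's choice and a few alternatives. Always keeping the base policy's action is one natural way to aim at the paper's sufficient conditions (the actual conditions are more general); the `step`, `all_actions`, and `value_of_pi` callbacks are assumed interfaces, not the paper's API.

    def choice_fn(state, base_policy, all_actions, k=3):
        """Keep the base policy's action plus the first k-1 alternatives in
        enumeration order (a stand-in for a real heuristic ranking); every
        other action branch is pruned from the search tree."""
        keep = [base_policy(state)]
        for a in all_actions(state):
            if a not in keep and len(keep) < k:
                keep.append(a)
        return keep

    def lookahead(state, depth, base_policy, all_actions, step,
                  value_of_pi, gamma=0.95):
        """Depth-limited expectimax restricted to the chosen actions;
        leaves are evaluated with the base policy's value function."""
        if depth == 0:
            return value_of_pi(state)
        best = float("-inf")
        for a in choice_fn(state, base_policy, all_actions):
            q = 0.0
            for prob, reward, next_state in step(state, a):  # outcomes of (s, a)
                q += prob * (reward + gamma * lookahead(
                    next_state, depth - 1, base_policy, all_actions,
                    step, value_of_pi, gamma))
            best = max(best, q)
        return best

With an exact value function and the base action always retained, this lookahead's value can never fall below the base policy's; the interesting failure cases arise when those ideal conditions are relaxed.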
Training Deep Reactive Policies for Probabilistic Planning Problems
State-of-the-art probabilistic planners typically apply lookahead search and reasoning at each step to make a decision. While this approach can enable high-quality decisions, it can be computationally expensive for problems that require fast decision making. In this paper, we investigate the potential for deep learning to replace search with fast reactive policies. We focus on supervised learning of deep reactive policies for probabilistic planning problems described in RDDL. A key challenge is to explore the large design space of network architectures and training methods, which was critical to prior deep learning successes. We investigate a number of choices in this space and conduct experiments across a set of benchmark problems. Our results show that effective deep reactive policies can be learned for many benchmark problems and that leveraging the planning problem description to define the network structure can be beneficial.
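For concreteness, here is a minimal PyTorch sketch of the supervised-imitation setup the abstract describes: a feed-forward policy trained by cross-entropy to match an expert planner's action choices. The flat feature encoding, network shape, and training details are illustrative assumptions; the paper explores a much larger design space, including architectures derived from the RDDL problem description.

    import torch
    import torch.nn as nn

    class ReactivePolicy(nn.Module):
        def __init__(self, n_state_fluents, n_actions, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_state_fluents, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),  # logits over discrete actions
            )

        def forward(self, state_fluents):
            return self.net(state_fluents)

    def imitate(policy, expert_states, expert_actions, epochs=50, lr=1e-3):
        """Cross-entropy imitation of an expert planner's action choices.
        expert_states: (N, n_state_fluents) floats; expert_actions: (N,) longs."""
        opt = torch.optim.Adam(policy.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(policy(expert_states), expert_actions)
            loss.backward()
            opt.step()
        return policy

At decision time the trained network replaces search entirely: a single forward pass yields the action, which is what makes the policy "reactive".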
Hindsight Optimization for Probabilistic Planning with Factored Actions
Inspired by the success of the satisfiability approach to deterministic planning, we propose a novel framework for online stochastic planning that embeds the idea of hindsight optimization into a reduction to integer linear programming. In contrast to previous work using reductions or hindsight optimization, our formulation is general purpose, working with domain specifications over factored state and action spaces, and is thereby also scalable in principle to exponentially large action spaces. Our approach is competitive with state-of-the-art stochastic planners on challenging benchmark problems and sometimes exceeds their performance, especially on problems with large action spaces.
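The following Python sketch conveys the hindsight-optimization idea without the integer-linear-programming machinery: each candidate first action is scored by averaging the values of deterministic plans over a shared set of sampled futures. The `sample_determinization` and `solve_deterministic` callbacks are hypothetical stand-ins for the paper's ILP reduction over factored state and action spaces.

    def hindsight_action(state, actions, sample_determinization,
                         solve_deterministic, horizon=10, n_futures=30):
        """Return argmax_a (1/K) * sum_k V_det(state, a, future_k).
        Each future fixes all stochastic outcomes over the horizon in
        advance, turning the lookahead into deterministic planning."""
        futures = [sample_determinization(horizon) for _ in range(n_futures)]
        best_a, best_q = None, float("-inf")
        for a in actions:
            # Average deterministic plan value over the shared futures.
            q = sum(solve_deterministic(state, a, f) for f in futures) / n_futures
            if q > best_q:
                best_a, best_q = a, q
        return best_a

Sharing the same sampled futures across all candidate actions (common random numbers) reduces the variance of the comparison; the factored ILP formulation matters because it lets the argmax range over exponentially many joint actions without enumerating them as this sketch does.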